A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation

Author

  • Ata Kabán
Abstract

The need for non-standard text categorisation, i.e. categorisation based on some subtle criterion other than topic, may arise in various circumstances. In this study, we consider written responses to a standardised psychometric test for determining the personality trait of human subjects. A number of state-of-the-art text classifiers that have been very successful in standard topic-based classification problems turn out to perform poorly on this task. Here we propose a very simple probabilistic approach, which is able to achieve accurate predictions and demonstrates that this peculiar problem is still solvable by simple statistical text representation means. We then extend this approach to include a latent variable, in order to obtain additional explanatory information beyond a black-box prediction.

Similar resources

Improving Biomedical Text Categorisation with NLP

Background: Text categorisation has been used in bioinformatics to help identify documents containing protein-protein interactions. Standard text categorisation methods have used the bag-of-words approach with little input from NLP. While this has proved effective in the past, there is some evidence that the techniques are not adequate in some biological domains. Here we examine how chunking, n...

Reading and Assessing the City / Neighborhood Fabric As a Text. Case Study: Sar-Tapulah Historical Neighbourhood in Sanandaj

From a linguistic point of view, the city can be seen as a text, consisting of different components and structures being related to each other beyond a sentence. Looking at the city from this point of view, what establishes a syntactic relationship and cohesion and coherence of the components of the city as a common language is called the syntax of the city. Linguistic study of the text of the ...

Translation by Text Categorisation: Medical Image Retrieval in ImageCLEFmed 2006

We present the fusion of simple retrieval strategies with thesaural resources to perform document and query translation by text categorisation for cross–language retrieval in a collection of medical images with case notes. The collection includes documents in French, English and German. The fusion of visual and textual content is also treated. Unlike most automatic categorisation systems our ap...

Mapping Semantic Knowledge for Unsupervised Text Categorisation

Text categorisation is challenging, due to the complex structure with heterogeneous, changing topics in documents. The performance of text categorisation relies on the quality of samples, effectiveness of document features, and the topic coverage of categories, depending on the employing strategies; supervised or unsupervised; single labelled or multi-labelled. Attempting to deal with these rel...

SearchSleuth: The Conceptual Neighbourhood of an Web Query

This paper presents SearchSleuth, a program developed to experiment with a form of automatic local analysis that extends the standard Web search interface to include a conceptual neighbourhood focused on a formal concept derived from the query. The conceptual neighbourhood is displayed with upper neighbours representative of a generalisation operation, and lower neighbours representative of a s...

Publication date: 2008